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Summary. The Bourewa beach site on the Rove Peninsula of Viti Levu is the earliest known 
human settlement in the Fiji Islands. How did the settlement at Bourewa develop in space and 
time? We have radiocarbon dates on sixty specimens, found in association with evidence for 
human presence, taken from pits across the site. Owing to the lack of diagnostic stratigraphy, 
there is no direct archaeological evidence for distinct phases of occupation through the period 
of interest. We give a spatio-temporal analysis of settlement at Bourewa in which the deposition 
rate for dated specimens plays an important role. Spatio-temporal mapping of radiocarbon date 
intensity is confounded by uneven post-depositional thinning. We assume that the confounding 
processes act in such a way that the absence of dates remains informative of zero rate for 
the original deposition process. We model and fit the onset-field, that is, we estimate for each 
location across the site the time at which deposition of datable specimens began. The temporal 
process generating our spatial onset-field is a model of the original settlement dynamics. 
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1. Introduction 

Bourewa beach is an archaeological site on the sout hwest coast o f Viti Levu, the largest 
island in the Fiji Islands Group. It was identified in iNunn et al. ( 2004 ) as a site of likely 



early settlement by surface finds of distinctive early Lapita-era ceramics. In the course of 
the excavation, which ended in February 2009, in excess of one hundred pits, of varying 
sizes, were dug. Material was selected from some of these pits for radiocarbon dating. 
These data, and their observation model, are described in Section [31 The likelihoods for 
the unknown ages of the specimens are plotted in Fig. [TJ The number of dates measured 
in any given pit varies from zero to six. The relative pit locations and the number of 
dated specimens in each pit are shown in Fig. [5J Criteria for selection for dating included 
secure association with cultural remains, for example, human burials and diagnostic forms 
of decorated pottery. The pits were excavated over several years, and dated incrementally, 
under funding constraints which varied through time. There is a need to date across the site 
in three spatial dimensions, so material selected for dating is to some extent deliberately 
spread out. 
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Fig. 1. All Bourewa data: (y-axis) years BP, (x-axis) specimen index, likelihoods (blue) data omitted 
from study, (red) data retained. Two early dates were ajudged insecurely linked to evidence for 
human presence, and the later dates belong to a qualitatively different phase of activity at the site. 
Text (top row) is pit-name and (bottom row) date identifier. 



There is a conjecture, described in iNuniJ (|2007l l2009f ). that a small initial settlement 
in the centre of the beach grew in size to eventually cover the studied site area. We 
formalise this conjecture, and compute the posterior probability that it is true, along with 
other measures of support. We make a spatio-temporal map of the date at which the 
deposition of human-related material began at each point on the site. For each spatial 
location, this "onset-field" map gives the time at which the deposition rate changes from 
zero to some positive value. The inference is conditioned to ensure that the pit locations 
and the distribution of dated specimens across pits are not by themselves informative of 
variations in the time-span of human activity from pit to pit. 

We begin our modeling of the data with a description, in Sectional of a family of prior 
models for the jouit dist r ibution of specimen ages. These temporal mo dels, developed in 
Navlor and SmithI (|l988h : IZeidler et al.l (ll998l) : |Nicholls and JonesI (|200lh and implemented 



in the OxCal software described in Ramsev ( 200l[) . are in widespread use. The dated- 
specimen deposition process is a Cox process in time. Specimen ages are the times for 
events in a Poisson point process, which has a rate which may vary from pit to pit, but is 
constant in time between change points. The site-wide dated-specimen deposition process 
is the sum of independent Poisson processes in each pit (the specimen are all small pieces 
of charcoal and shell and would have been waste material at the time). Change points 
are the boundaries of phases of deposition. In this class of models, the number of change 
points is known from separate evidence. Phase boundaries mark site-wide changes in the 
human activities generating dateable specimens and may be indicated by strata. The phase 
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Fig. 2. The scatter of pit locations, and the number of dated specimens in each pit which were 
retained in the analysis. The box marl<s the edges of the onset-field we will model and has been 
aligned to the data. The sea is adjacent to the upper edge of the box, with the short box axis pointing 
towards the sea, and the long axis pointing along the beach. Axes give excavation coordinates. 



boundaries are modeled as events in a second constant rate Poisson point process. The 
rate parameters (for specimen ages, and phase boundary ages) are not usuaUy physicaUy 
interesting, or easy to model, as the ages of dated specimens are the ages of events in an 
original deposition process thinned by arbitrary specimen selection at excavation, and the 
uneven action of decay. It is not necessary to estimate these rates if we can condition the 
analysis on the number of phases, and the number of dates in each phase. This is possible, 
for example, if phases are strata and a fixed number of specimens are taken temporally at 
random within known strata. 

In our spatio-temporal model, described in Section [5j an onset-time field replaces the 
earliest phase boundary of the temporal phase model of SectionH) This allows the start-time 
for the deposition of dateable material to be a function of position. The phase and dated- 
specimen processes are otherwise unchanged. The onset-field process is a two-parameter 
spatio-temp oral process on a rectangular lattice, and is related to the family of models 
described in Richardson One parameter of the onset-field process controls the rate 

at which an unoccupied cell is occupied by "immigration from off-site" and the other the 
rate for cell occupation from neighboring occupied cells. For suitable parameter choices, the 
field can vary from a simple space-time cone, developing like a solution to the non-linear 
Fisher equation in a wave-of-advance, to a randomly corrugated surface. A prior sample 
is shown in Fig. [31 Further aspects of the model are illustrated in the supplement. We fit 
this model in Section |6l allowing just one phase, since there is no given phase structure 
for the Bourewa data. In this model, dated specimens are generated in a single interval 
with a spatially varying onset-time and a finishing time which is the same everywhere. The 
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Fig. 3. A realization of the prior of Section [531 for specimen ages 6 (green dots), phase boundaries 
(blacl< dots, for a single phase, M = 1), onset-field (j> (piecewise constant surface of 32 x 13 cells) for 
Q,/?i and /32 distributed with ^ = 10 and B = 1 in Equation (8) and limits L = 2000, U = 3500 years 
BP. (z-axis) years BP, (x,y-axes) 0.25cm units. 



dated-specimen deposition process has a rate which is assumed constant in time when it is 
non-zero, but may have arbitrary spatial structure. Because we condition the analysis on 
the number of dates in each pit and on the number in each phase (there is only one phase), 
there are no deposition rate parameters left to estimate. 

In the remainder of the paper we fit extensions to the single-phase onset field model. 
However, the model in Section [5] has a good balance of realism and simplicity, and the 
further development may be read as model mis-specification testing. Is there really just a 
single phase, and if not, what temporal variation is present? Although there is no evidence 
from stratigraphy for phase structure for the Bourewa data, it is possible that som e phase 
structure is present. This possibility is implicit in the analyses of lNunnI (|2007 . and is 

a second question (besides the settlement hypothesis) of archaeological interest. 

The third model, which we describe in Section [7l is a non-parametric extension of the 
basic phase model of Section |31 with no spatial structure. The number of change points in 
the deposition rate between the beginning and end of deposition is unknown. Since the phase 
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structure is unknown, we can no longer condition on the assignment of specimens to phases. 
We need therefore to model the ratio of deposition rates between phases, in order to weight 
the assignment of dates to phases. Significant variation in the accumulation rate of specimen 
dates is evidence for phase structure. However, although evidence for two or more phases 
is evidence for rate variation, the phases we reconstruct need not be culturally significant. 
We fit the phase structure model of Section [7] to the Bourewa data in Section 17.2.11 and 
get results which are misleading in just this way. We therefore test this third model in 
Section [7.2.21 with an applica t ion to a second data set, the Stud Creek radiocarbon data, 
published by iHoldawav et al.l ( 20021 ). For these data there is a hypothesis that the original 
deposition was interrupted for some time. We reject the single-phase model in favor of a 
three phase model. We are at this point fittin g a change-po int process of unknown piece- 
wise constant rate, with no spatial component. iGreenI (|1995I ) uses a temporal change-point 
analysis of coal-mining fatalities data to illustrate reversible-jump MCMC. Our event times 
are observed with the radiocarbon uncertainty, but the analysis is, at this point, otherwise 
the same. 

We return in Section [S] to the Bourewa data to fit a model with the randomly variable 
temporal phase-structure of Section [7] and the spatio-temporal onset-field of Section [5] In 
this fourth and final model, we start deposition with a spatially varying onset field and follow 
this with an unknown number of phases. We condition on the assignment of dates to pits, 
but not the assignment of dates to phases. The assignment of a specimen to a phase depends 
on the ratios of deposition rates between phases, as in Section [71 We assume that these 
ratios do not vary from pit to pit. This reduces the number of unknown rate parameters 
to a small number, one less than the (unknown) number of phases. The assumption is 
good if, for example, it was true for the original deposition, and the uneven thinning due to 
specimen selection and decay is separable, so that the thinning probability is the product 
of one function of space and another of time. 



2. Related literature 

Discusion of related literature on the Richardson process itself is given in Section 15.21 



2. 1. Methodology 

Maiumdar et al.l ([2005') give a Bayesian statistical framework for the analysis of spatio- 
temporal data with a single change point. They note that the framework is easily gener- 
alised, and illustrate it by fitting a Gaussian process Y{s,t) s e 3?^, t e 3? with a change 
point at time t = to. The process changes from /if + U{x,t) + W{x,t) + e{x,t) a.t t < to 
to fit + U{x,t) + W{x,t) + e{x,t) at t > to. Here e{x,t) is a field of independent normal 
random variables, and U, V and W are Gaussian processes with separable translation- 
invariant covariance functions. These data are observed at N points in space at each of M 
points in time. In their successful simulation study they fit fourteen parameters to data: 
the scalar means and variances of e either side of to (four), the space and time scales of the 
exponential class correlation functions for U, V and W (six) and their three variance-scaling 
parameters, plus to itself. Our own fitting assumes complete spatial independence between 
specimen-deposition events at different pits, conditional on the unknown deposition rate. 
On the other hand, we do not know the deposition rates, or the number of change points, 
and the onset field is a new kind of change-point, since it is a change-field which spreads 
gradually over the region, marking the transition from zero to non-zero deposition rate. 
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llbaiiez an d Simd (j2007l) . w ho wish to model change-points which spread in this way, have 
suggested a modification of Maiumdar et al.l ((2005,:) . though there is to date no fitting. 

We found few apphcation s of Ba yesian or hkehhood-based inference for spatial-temporal 
growth processes. IZhu et al. I (I2OO8') fits a complex, and relatively realistic model to beetle 
presence and count data and tree-mortality data for two beetle species and the tree-health 
indicator simultaneously. The two species compete and spread on a random graph with 
trees at nodes. Time is discrete corresponding to year. The likelihood for the process is 
given as a Markov Random Field up to an unknown normalising constant. The process 
(beetle counts, tree state) is observed directly. Our onset-field is similar: the process is 
spread via neighbors, and immigration. The probability density for the onset-field process 
here is given as an MRF also, though our simpler process is normalisable. However, our 
onset-process is observed indirectly, through the events which follow it, and it is this aspect, 
its role as a change point, w hich distinguishes the applications. 

Moller and et al. (|2008h fit data for a point process Xt of bush-fire locations and dis- 
crete times allowing a spatially structured intensity of the form X{x,t) — Xi{x)X2{t)S{x,t) 
with E{S{x,t)) = 1 and {x,t) € 3?^ x Z. The components Ai and A2 are given in terms 
of linear predictors fro m covariates for spatial and temporal structure respe ctively and 
M0ller and et al.l ( 2008 ) take a shot-noise process for the residual process S{x,t). iDiggle et al 



set this framework up to model spatial-temporal disease da ta, with a unit me an log- 
Gaussian residual process. Because the likelihood is intractable, iMoller and et al 



develo p alternative estima tion procedures, inclu d ing L indsay's composite likelihood. The 
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Diggle et al. ( 2005 ) and M0ller and et al. ([2008') is to model the bulk point process 



ID. In contrast, our data collapse the spatial locations of dated specimens in a pit 
(which has some extension) onto a single point, and events in each pit are modeled using 
a process which is marginally a ID Cox process. Separable intensities play a role in all the 
spatial-temporal statistical analyses we have cited, including our own. 



2.2. Application 

Where there is an interest in recovering unknown phase structure without spatial structure, 
many authors s um the likelihood func tions for individual specimen ages to get a single 
function of age. iHoldawav et al.l (|2008[ ) is a rare example in which this procedure may be 



justified. Under certain condition s, the summed likel ihood is an estimate of the dated- 
specimen deposition-rate function. Bavliss et al.l ( 2007 ) warn against the common practice 
of treating this as a proxy for population density. 

Spatial maps based on the distribution of radiocarbon ages are not usuall y model based, 
but simply a sequence of scatter plots of find locations windowed by age, as iGraham et al. 
([l996 ) . One dime nsional projections of 2D data, such as the compilation of early European- 
Neolithic dates bv lAmmerman and Cavalli-Sforza] (1971), who projected on distance-from- 
Jericho, are commonly regressed. Hazelwood and Steele! ( 2004 ) give a recent overview of the 
same subject, and make a comparison with the settlement of North America, in two spatial 
and one temporal dimension. These authors fit a Fisher wave-of-advance model using a lin- 
ear regression to get the wave speed, and guess plausible values for other model parameters 
based on prior know ledge of generic human demography. In the latest work in this spirit, 
Davison et al. ( 2009[ ) map the Neolithic settlement of Europe, modeling non- interacting ex- 



pansion from two sources, using a Fisher wave-of-advance model with parameters for growth 
rate, carrying capacity, spatially varying 2D advection and a scalar diffusivity. They pool or 
otherwise reduce the earliest dates at a given location to form a central estimate for the local 
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arrival of the Neolithic peoples. They fix model parameters using a hybrid scheme. Initial 
conditions, such as the starting locations and times of the two expansions, are obtained by 
minimizing the root mean square difference between the reduced date at each location and 
the arrival time of the modeled population wavefront at that location. This is nonlinear 
regression. Parameters of the advection-diffusion are fix ed using prior knowledge of generic 
human demography, as in Hazelwood and Steele ( 2004 ). Note that, although these authors 
do not make likelihood-based inference, they are modeling an onset-field. 



Blackwell and Buck! (|2003f ) treat a collection of radiocarbon dates from a number of 



sites excavated under different conditions at different times over northern Europe. For such 
spatially sparse data it is not useful to couple the onset event at each site with a smooth 
spatial field. They group the data into regions, and fit an in dependent temporal model 
for each region. Since the analyses are not spatially coupled, iBlackwell and Buckl (|2003l ) 
page 238 describe their own analysis as "not truly spatiotemporal in nature". In contrast, 
the data we treat comes from a single intensively studied site. Dated pits are dense in the 
modeled region. These properties of the Bourewa data justify some spatial smoothing of 
the field of onset times. 



3. Data, and observation model 



The data are K — 60 uncalibrated radiocarbon ages yi,i — 1,2, .. . K, with associated labo- 
ratory standard errors ai, grouped in iJ = 32 pits. Some of these data were dropped as dis- 
cussed in the final paragraph below. For pit number h = 1,2, . . . , H , Xh = ixi,h, X2,h) gives 
pit coordinates. Let h(i) map data index to pit index. The location a;h(i) — (2^1,11(1)7 2:2, h(i)) 
of the i'th dated specimen is given as a point-location for the pit in which it was found, 
although pits vary in size from 0.25 to 2 meters on each side. Some pits are dug but not 
dated. Dated pits are distributed over a region approximately 200 by 50 meters, with the 
majority of pits in a cluster some 50 meters square. 

Denote by Qi the unknown true age of the i'th dated specimen, measured in calendar 
years before 1950 (Before the Present, BP). Observations of specimen ages are distorted by 
a non-linear, empirically determined, cal ibration function /x, with a s sociated error func tion 
a. In the standard observation model of Stuiver and Polach (1977); Buck et al. 1 1992h . 



with ei ^ N{0,(jf -\- a(QiY) independent random variables. We are omitting some straight- 
forward details: marine and terrestrial material are in fact calibrated using different cal- 
ibration functions and radiocarbon ages mea sured on loca l seashell are subject to the 
s mall constant marine - reserv oir offset given in iNunnI (|2007l ) , which we treat as discussed 
in Jones and Nicholls (|200lt). In our software w e use the 2004 calibration functions of 
McCormac et al.l (j2004f ) and Hughen et al. ( 2004 ) interpolated from their calibrations at 5 
year intervals down to one year intervals, and then approximate Qi as an integer year. In 
our discussion here Y and G are continuous. 

The normal likelihood for parameters <d ^ given data Y ~ y \& 



K 



£(0;y)(xn(<^'+^(^O')"'/'exp - 



(a*(^») - Vt 

2{g1 + a{d. 



with /i and a functions of Q available from a look-up table. The likelihoods of the Bourewa 
specimen ages are graphed in Fig. [T] Certain specimens (blue points) were dropped from 
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the analysis as either insecurely linked to evidence of human presence, or associated with 
activity outside the period of interest. Some pits containing a single omitted specimen were 
thereby removed from the analysis. Data from two pits lying far from the central area were 
dropped as the spatial extrapolation to those isolated pits was deemed too great, leaving 
the H = 24: pits and K = 49 radiocarbon dates represented in Fig. [5J 



4. Conditioning on phase structure 

The posterior distribution specified in this section can be simulated using existing OxCal 
software. However, the notation of this section is needed below, and the numerical results 
will be of interest because, although obtained from standard models in the literature, they 
conflict with the results we get from the onset-field model. 

We begin with a prior model of the process generating dates in a single pit. Dates within 
a pit are grouped into phases, corresponding to some form of strata. The ages Qi,Qj of 
specimen from distinct strata are known a priori to be ordered, so that Qj > Qi if specimen 
j is from a deeper stratum than that of specimen i. Denote by 5'^,^^ = 0, . . . , Af the 
unknown true age of the m'th phase boundary, with < ^fm+ii so that numbering is from 
the surface to the bottom of the pit, and the m'th phase is the age interval ['fm^i, Let 
m(z) give the known phase [ie, stratum) of the z'th dated specimen. Phases may be empty, 
with no specimen taken from the corresponding stratum, as happens if for example there is 
a hiatus in deposition, or possible erosion. Let L and U be fixed lower and upper bounds 
on specimen and phase boundary ages, available as prior knowledge. We use L = 2000 and 
U = 3500 years BP except in Section 17.2.21 The parameter state space for analysis of a 
single pit is 

^l = {{■tp,0) : L < Vo, i>ni-i <ipmm=l, .., M, V'm < U, -f/'mC*)-! <di < ipm(i) i = 1, ■■,K}. 



We follow iNicholls and JonesI ( 2001 ). who get a prior density for 0) by modeling the 



deposition process generating those parameters. The phase boundaries ^i, . . . , are 
the events of a Poisson point process of constant rate A,;, wh ich starts at the onset-time 
'^M and stops at the ending-time ^>q. Nicholls and Jone^ ( 200l[ ) prior density for onset and 



ending times '^m = V'm and v|/q — is pq.m{'4>o,'4>m) = {U — L — {tpM — i^o)) ^ and this 
leads to a joint prior density p{^j\M) at ^' = ^ equal to 

Pi-^l^) ^ TTT—r — h w-i ' 

This density has a uniform marginal distribution for the span statistic, — ~ U{0, U — 
L). This is des irable when the span is of particular interest, as is often the case (though 



Navlor and Sm ith (1988) make p{^p\Al) — 1 favoring greater spans over lesser). The speci- 



men ages 01, ... , Qk are the events of a Poisson point process of piece-wise constant rate 
Xe(t) — Xg^rri for V'm-i <t < ipm- The rates Xg^rn are not of interest, since the archaeologist 
imposes an uneven thinning on specimen, in selecting specimen for dating, on top of possible 
uneven thinning by in situ decay. However, since the analysis is conducted conditional on 
the number M of phases, and conditional on specimen phases m, the conditional density 
for 8|\l/ is 

K 

p(6'|V',m) = ]^(V'mW-l - V'm(i))"^ (2) 

1=1 
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The posterior distribution is 

9\y, m, M) = £{0; e\M, m), (3) 

with 

p{^, e\M, m) = p{0\i;, m)p(V'|M) 

given by Equations ([T]) and ©• 

In this kind of analysis, a site is treated as in effect a single pit. Fitting this model to 
the Bourewa data, for a si ngle phase, M = 1, s o m(i ) = 1 for each i — 1,2, ... ,K using the 
MCMC algorithm given in iNicholls and JonesI (|200l[ )' to sample the posterior distribution of 



Equation Q , we get the marginal posterior distributions for onset age ipM and ending age 
tpo shown in green in the leftmost column of Fig. HI As part of a later model mispecification 





Fig. 4. (top) posterior distributions of onset-times {ipM or 0^^^, y-axis, years BP) by pit (pit-names, 
X-axis, sorted by median onset), (bottom) posterior distributions of ending times V^o- Leftmost his- 
tograms labeled psi_M are (red) ipm for the onset field analysis of Section[6]and (green) ipM,ipo for 
all dates in single phase, as Section m The remaining columns are (red) onset-field interpolations 
<^^^j and (blue) i/^m, V'o from an independent single phase analysis (as Section |4j of the data from 

the pit for that column. 

analysis, we fit the (single phase) model of this section to each pit independently, including 
pits with just one or two dated specimens (for which the posterior distribution of ipi and 
ipQ is dominated by the choice of the upper and lower limits, U and L). These distributions 
are displayed in blue and labeled by pit name in Fig. 21 
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5. A spatial-temporal onset-field model 

The whole process is started at the first-onset time iJjm- This is "first settlement". The 
onset field begins to evolve, as the area in use is enlarged by spreading out, and by new 
arrivals. New phase events at times tpM-i, • • • V'l are step-changes in the deposition rate 
across the then-occupied site, corresponding to changes in the way the then-occupied site 
is used. The deposition process terminates at the ending event, at time f/'o- 

5. 1. The onset-field process 

Let (pc, c = 1, 2, . . . , C be a field cpc < V'm, defined on a rectangular Ci x C2 lattice of 
C = C1C2 square cells. Let 7V(c) give the set of neighbors of the c'th cell. Corner, side 
and interior cells have two, three and four nearest neighbors respectively. Let A/'+(c) = 
Af{c) U {c}. Let {x) map points x in the pit coordinate system to the index of the overlaying 
lattice cell. A modeled pit has a point-like location, though it may lie cross cells. We report 
results for a 13 x 32 lattice of cells each about 2.375 meters square. 

The variable (f)c gives the onset time for deposition at points x in cell c = (.1) . The onset 
time (j)c in cell c is the time of first occupation for cell c. There is one arrival in each cell 
and C arrivals in the course of the process. Let rc.n give the arrival rate in cell c in the 
time interval preceeding the n'th arrival in the process, and let i?„ = rc,„ give the total 
arrival rate in the n'th interval. Let L < iJjq < ijji < . . . ipM < U and for c = 1, 2, . . . , C 
and c' e A/'(c) let positive constants ccc (the rate for immigration to cell c) and /3c, c' (the 
rate for occupation of cell c' from occupied cell c, with (3c,c' = fic',c) be given. We call the 
process parameterised by /3 'migration'. 

The onset-field </) is a realization of the following immigration-migration process: cell c 
is occupied by immigration at rate ac, when cell c becomes occupied, the arrival rate for 
occupation at each of its neighbors c' € M{c) is increased by /3c,c'- A cell once occupied 
stays occupied. When we fit data, we will condition on the first onset-field event occurring 
at ijjM years BP; the algorithm below is for the unconditional case. 

Step 1 Initialize field of rates 

Step 1.1 to ^ V'M, rc^i ac for each c = 1, 2, . . . , C and R\ = rc,i- 
Step 1.2 Set n ^ I. 

Step 2 Simulate arrivals at Exp(ir!„) intervals: 
while Rn > 

Step 2.1 simulate 6n ~ Exp(i?„) and c„ ~ [ri^n/ Rn,r2,n/ Rn, ■ ■ ■ ,'rc,n/Rn); 
Step 2.2 Set tn tn-i — Sn and (/)c„ tn- For each c = 1, 2, . . . , C, set 

rc,n+i ^ rc^n and then overwrite rc„,n+i = and rc,„+i rc,n+i+ Pc„,c 

for each c e 7V(c„) \ {ci, C2, . . . , c„_i}. 

Step 2.3 set n ^ n -t- 1. 

Wo orient the lattice along the beach, and allow the rates for migration along and at right 
angles to the long axis of the beach to be unequal. We fit fields for Uc = a and fic,c' = Pi 
for c ^ c' along the beach and ^c,c' = 02 for c' closer to, or further from, the sea. If a 
is small, and /3i = /32 = /3 is large, typical onset fields show a spread from a single centre. 
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A 2+ ID plot is cone- like. As a increases at fixed /3, the number of centres-of-expansion 
increases, corresponding to isolated settlements which grow and merge. 

The onset-field model is convenient for fitting. The joint density p((j)\a, /3,'>p) of (p is 



c 



p{(f>\a,l3,ip) = rc„,„exp(-i?„(5„), 



from the algorithm. Now 



RnSn — 



'eA/'(c) 



Each cell c contributes to Rn^n a term ac{4>c — and for each neighbor c', a term 
/3c',c(0c' — '/'c)I<^c<<^ /■ Taking the two such terms for each edge gives /3c,c'|</*c — </'c'|- Let 
p{4>N'+ic)) be the rate function of 0c, 



(4) 



so that p{4ij^f+(c„)^Cn) = fc^^n- This is the arrival rate for occupation of cell c in the 
time interval preceding (pc, and 0c is an age BP, so that interval is ages greater than 4>c- 
Assembling these terms gives 



^ / 1 \ 

p(0|a,/3, V') = J^exp log(p(0^+(c)),c) - ac(V'M - 0c) - 2 X! ^c,c' |0c - 0c' | 

c=l \ c'eA/'(c) / 



(5) 



The right hand side is normalized over G (— cx),'0m]'^ so inference for a and /3 is not 
obstructed by an intractable normalization. Also, $ is a Markov Random Field (MRF) 
with cliques {c}, {c, c'}c'g7V'(c) and H^{c). 



5.2. The Richardson growth model, and related models 

The onset-field process is a special case of the contact process with immigration. Cells of a 
contact process, occupied by the immigration and migration processes above, are abandoned 
at per capita rate 7; a binary field ^c(''"), c = 1, 2, . . . , C, r S [0, 00) indicates if a cell is 
occupied at time r. The process is often studied without immigration, conditioned on one 
or more arrivals, or seed s, at time t = 0. The e quilibrium statistics of ^ are studied, 
as functions of /3 and 7. iDurrett and LevinI (|l994[ ) review the process in the context of 



applications in ecology. A contact process in which a single cluster of occupied cells grows 
from a single seed, with no immigration or abandonment (a = 7 = 0), is a Richardson 
cluster-growth model, so the on set-field mo d el abo ve is a Richardson model with multiple 



clusters, ie, with immigration. iRichardsonI (|1973 ) showed that the cluster g rows linearly 



Durrett and Liggett! (1981) extend this. 



and tends in shape towards some fixed shape, 
giving furth er results for the bou ndar y shape. If the proces s is iso tropic, this limiting shape 
is a circle. iHammerslev and iDurrett and LiggettI ( 198l[ ) are sceptical. However, 

the limiting shape is not known at present. The simulation experiments reported in the 
Supplement show that it is at least very nearly isotropic. This is important as we would 
like the process at /3i = /32 = /? to be isotropic, for clarity of interpretation. 
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uses a Richardson process with a Poisson number of randomly located seeds 
to model fields of aluminium grains as random lattice segmentations. The cluster seeds all 
'arrive' at the start time. The process r ealizes a ra ndom segmentation of the lattice into 
regions labeled by the index of their seed. lLe3 (|l999f) conjectures that, for his initialisation, 
the large scale structure of the a = 0, 7 = process is isotropic, and for fixed seeds the 
region borders converge to the edges of a Voronoi tesselation as the cells are subdivided. 

Because our seeds arrive at different times, the onset-field process generates a segmen- 
tation which looks something like a J ohnson-Mehl t esselation at large scale. Johnson-Mehl 
tesselations, described for example in Moller (|l999l) . are random tesselations of the plane 
continuum. Seeds arrive in x [0, c») according to a Possion process in two space and one 
time dimension. Each seed captures a circular domain which grows with a fixed speed until 
it meets the boundary of another domain. Seeds falling into existing domains are deleted. 
This continuum version of the process we are using, with deterministic region growth, would 
be a natural off-lattice model, which we have not considered. 

See the supplement for further discussion of the stationarity and isotropy of the onset- 
field process, and a relation to traveling wave models. 



5.3. The onset-field as a prior probability density for the Bourewa data 

The full prior density for the inference in the next section is 

p{a, P, e, (j), -(MM, m) = p{9\(j), ip, m)p(0|a, -0, /3, ma.x{(f>) = ^/jM)p{ip\M)p{a, P). (6) 

Prior density p{ip\M) is given Equation ([T]). We condition the onset-field on max((/)) = ^Jm] 
in terms of model elements, this means that the beginning of the earliest phase coincides 
with first settlement, and spreading occupation. The modification is 

p(0|a, /3, ip, max(0) = ^m) = a" V') 

for p((/)|a, /?, given in Equation ([S]), and 

(/> e {(/i : G (-00, V'm]'^, max((/)) = Va/}- 

Although we now drop max((/)) = V'm in our notation, all the distributions below which 
involve (j) are conditioned in this way. The onset field fixes the local change point from zero 
to non-zero rate for dated-specimen deposition. This imposes 

V'm(i)-l < < min((/)^^^^^^-|,'0m(i))- 

The bound on the right hand side is the change-point to a non-zero deposition rate, so we 
should replace the density for 0|(^, m) given in Equation ([2]) with 

p{e\^, 0, m) = n — 7X ■ (7) 



mm 

4=1 ' 



The onset field can be thought of as a spatio-temporal mask over temporal phase structure 
which would otherwise apply at all locations. 

Our prior for a and /3 is subjective. We expect just a handful of pure immigration 
events in the interval [L, U], so we need E{a) > 1/C(U — L). The "speed" of the expansion 
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is controlled by f3 and for small a is between 2/3 and 3/3 [cells/year] (see supplement for 
further discussion on this point). If the expansion is to cover the site in the interval [L, U] 
then we need 

> max(Ci,C2) 
^[P) ^ 2(L-C/) ■ 

Rate parameters a and /3 are otherwise positive so the maximum entropy priors are 

a-Exp(A/iV([/-i)), /3 - Exp(Bmax(Ci,C2)/2(L- [/)). (8) 

Prior simulation showed 1 < A < 20 and 0.5 ^ B < 2 generated a range of onset-fields 
with plausibly varied temporal "roughness" (increasing with a) and "peak to valley depth" 
(decreasing with /3). 

The onset-field process has a boundary effect. In realizations of the process, boundary 
cells are typically occupied later than cells in the interior. Since 

f p(0|a,/3,V)# = l, 

we can differentiate with respect to etc to get the moment identity 

E(0c)=V'm-e(^^^ y (9) 

Since corner and side cells have fewer neighbors than interior cells, the sum p{4>js/+(c)) in 
denominator Equation ^ is distributed over a smaller total there. Differentiating again 
with respect to etc' gives 



so (f)c + p{4>j\f+(c))^^ ^-nd 4'c' + p{4>N'+{c'))~^ s-re uncorrelated, and we have verified this 
numerically, as a check on our code. The field of values (jj^ + p{4>j\f+{c))~^ is homogeneous 
and isotropic to 2nd order, regardless of boundary effects. Further field identities are given 
in Section 15.11 of the Supplementary material. 

We might pad the site with a large border outside the area where pits are concentrated, 
or, considering Equation ([9]), raise ttc for c the index of a boundary cell. However, simulation 
of p(a, /3, 01V') showed that these approaches give a distribution of onset fields which does 
not represent prior belief for this site. If there are just a few immigration events, and the 
field is extended a great deal, then the immigration events tend to occur in the extension, 
and the pit-area is settled by migration. In this class of models, the interval between the 
onset event at time V'm (people arrive on the beach) and occupation at cells covering pits 
(the settlement reaches the excavated area) can be improbably large. By accepting some 
spatial inhomogeneity in the prior onset-field at the boundary we get a prior distribution 
better resembling prior belief. 

Pit locations are a second source of spatial inhomogeneity. The location of a pit with 
at least one dated specimen is informative for the onset field, without the radiocarbon date 
itself, because the deposition process must act for some finite time wherever there are dated 
specimens. This constrains ^ > tAq for = 1, 2, . . . , iJ, but not elsewhere. We go further 
and assert as part of the observational data that the deposition process acts for some finite 
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time at each point over the windowed region of this site, so ipo < (j^c ipM- Conditional 
on this assertion, the locations of the dated pits are not informative. As a side-effect, the 
marginal prior distribution of the span ipM — "00 depends on the priors for a and /3. Short 
spans are excluded at small /3 as the field needs time to cover the site. As a second side 
effect, the boundary effect mentioned in the previous paragraph is slightly reduced. 

A single sample from the prior p(9, -0, 0, a, /3) given above, for M = 1 is shown in Fig. |31 
The lattice is 32 x 13 and A = 10 and B = 1. Cells are 2.375 meters on each side, and 
the field is piecewise constant by cell. In Fig. [S] we show the prior mean and standard 
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Fig. 5. The prior mean onset field (left) and the prior standard deviation of the field (right), in the 
single-phase onset-field model of Section l531 The colors in the color bar give years BP. 

deviation of the field estimated marginally for each cell. Further figures illustrating the 
onset-field prior and the priors for ijj and 6 are shown in the Supplement. We see residual 
inhomogeneity in the prior of about 150 years from centre to corner. The marginal prior 
standard deviation of 0c at cell c is of order 250 years. This includes random offset variation 
from ipMi the initialisation-time. The standard deviation of the elapsed time ipM — fpc is of 
the order of 200 years. 



6. Fitting thie onset-field for a single phase 

The posterior density for deposition in a single phase [M = 1), modeled as Sectional with 
the onset-field of Section [S] conditioned to satisfy e [0O: 0J\/]'^ is 

p{a, /3, 0, 0, e\y, m, M) cx l{e- y)p{a, /3, 0, 0, 0|m, M). 

For p{a, /?, 0, -0, 0|m, M) see the definitions at Equation 

We fit this model by M etropolis-Hastings MCMC simulation. The updates are those 
of iNichoUs and Joned ( 2001 ). New parameters a and (3 are updated via a scaling proposal 



z ~ /7(l/2, 2), a' = za, /3' = z/3 and jointly with the onset- field in an update with proposal 
distribution p{(j3\i}})p{a, /3). The algorithm is effective when the latter, which updates a, the 
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two components of j3 and the field 0, has a reasonable success rate. We have a second onset- 
field update with proposal distribution p[(l>\a, fijip), which conditions on the immigration 
migration rate parameters. The two </)- updates generate new onset-field values at every 
cell, and are simulated using the algorithm of Section 15. f I The cj) field has a large spatial 
correlation, with structure on the same scale as the site itself, so single-cell MRF updates 
proved to be very inefficient. The condition cj) G [tpo,ipM]^ is implemented by rejecting 
proposals at the acceptance stage of the MCMC update which do not satisfy the condition, 
rather than by making proposals from the conditional distribution. 

An image of the posterior mean onset-field is shown in Fig. [B] There is a single peak. 
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Fig. 6. The posterior mean onset field (left) and the posterior standard deviation of the field (right), 
in the single-phase onset-field model of Section[6l 

Compared to the prior field. Fig. O the range of onset times (the depth of the field) has 
gone up, and the standard deviation, shown in the same figure, has dropped. Typical values 
(across cells) of the standard deviation of the elapsed times ipM — (f>c drop from values of 
order 160 years in the prior to values of order 60 years in the posterior. Fig. [J] shows 
the posterior distribution for first settlement, ma,x{(f>) = tpM (with M — 1, leftmost red 
histogram) at around 3100 BP. By contrast the single phase analysis puts this event at 
around 2900 BP, since it is pulled down by pits with younger dates. Fig. [7] shows the sets 
of cells 

green = {c : Pr(V'M - 0c > T*|y) > p*, c = 1, 2, . . . , C}, 
blue - {c:Pr(^M-0c <r*|y) >p*,c= 1,2,...,C}, (10) 
red = {1, 2,..., C}\ (green U blue), 

with T* = 150[years] and p* — 0.8 and we estimate the probabilities using MCMC simula- 
tion of p{a, /3, (j), ip, 9\y, m, M). Green cells are all settled within 150 years of first arrival, 
and blue cells are all settled more than 150 years after first arrival, with marginal cell 
probabilities all at least 0.8. A visually interesting threshold T* was set by searching, so 
there is a hazard akin to multiple testing. However, when we repeat this exercise on prior 
simulations no single threshold T* makes both 'green' and 'blue' non-empty at p* — 0.8. 



Fig. 7. Support for spread: (left) the posterior probability that V'ai-</'c < 150 (respectively, 0c > 
150) is greater than 0.8 for cells c in the green (blue) region, in the single-phase onset-field model of 
SectionjH (right) the corresponding image for the prior distribution of the onset field. Results for the 
multi-phase onset-field model of Section|8]are similar. 



Independent pit-by-pit estimation of the onset and terminal events for a phase model 
of the kind described in Section U] (ze, without onset field) is usually pointless if there are 
no more than two dated specimen in each pit. The marginal posterior density for the onset 
event has a tail which decays like 1/ipM- As a consequence, results are sensitive to the 
choice of the bound U. This is unsatisfactory, as U is usually just a very conservative 
upper bound on tpM, the first-onset. The blue histograms in Fig. |4]show these pit-by-pit 
analyses. The red histograms in Fig. 2] show the posterior distributions of 4>(^^^i^-fy the age 
of the onset-event at pit h — 1,2, ... , H . The onset field spatially smooths the onset times 
in the pit-by-pit analysis, concentrating the onset time distributions within the support of 
the pit-by-pit distributions. 

Where the red onset-field and blue single-pit histograms in Fig. factually conflict, there 
is possible evidence for model mispecification in the onset field. This occurs at pits '4' and 
'X20'. Because these pits have just two dates each, the evidence for mispecification is not 
compelling. In pit 'X20' we have two consistent, relatively late, dates (see Fig. [1]) while 
the onset field interpolates an earlier onset. However, the red and blue histograms at 'X20' 
do overlap a little, reflecting the fact that we may by chance have chosen a couple of late 
specimens from a population of specimens in accord with the onset field estimate. 

We are assuming that there was no abandonment of areas once settled, up to ipQ, the 
end of deposition associated with the culture of interest. The lower plot in Fig.[l]shows the 
all-in-one phase (green) the pit-by-pit (blue) and onset-field (red) estimates for -00. The 
single-pit analysis in pit '4' favors an earlier abandonment. Looking at Fig. [T]we see two 
early dates and no later dates, so this might be abandonment. On the other hand pit '4' is 
adjacent pit '1', where there is no evidence for abandonment. 

We have varied the priors for the immigration a and migration-speed /? parameters 
which control the shape of the onset field, and checked that results a and /3 are robust 
to reasonable variation. The posterior distribution of the migration rates /3i and P2 and 
immigration rate a for model fitting with different priors, scaling the prior mean rates by 
2, and by 1/2, are shown in Fig. [SI The prior simulations (left in Fig. [51 colored green, solid 
and dashed respectively) for /3i and (^2 coincide. The distributions of the two rates differ 
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Fig. 8. i-axis 25cm units/year (ie, rate multiplied by cell side length), y-axis frequency. (Left) the 
posterior distributions of migration rate /3i (solid, 'b1') and 132 (dashed, 'b2') in six independent single- 
phase onset-field models with the A and B hyperprior parameters of Equation |8) given by [A, B) = 
(10,1) (no symbol), {A,B) = (20,2) (-i-), (A,B) = (5,0.5) (triangle), {A,B) = (20,0.5) (5-star), 
[a, B) = (5, 2) (6-star) and [A, B) = (10, 1) on a 64 x 25 lattice (circle). Prior simulation at [A, B) = 
(10, 1) has square-symbols, (right) corresponding posterior distributions for a. 



in the posterior, and are not shifted by the changing priors with doubled and halved rates. 
Expansion of the settlement moved more rapidly along than up the beach. The immigration 
rate is more sensitive to the prior within the range considered. 

In order to measure support for spread from a single centre, we count the number of 
arrival-events, 

C 

^('^) ^ y^Il0c>0./Vc'eA^(c) 

c=l 

in a field. For example, the prior sample in Fig. [3] has V — 2 arrivals. The posterior to 
prior odds for = 1 in a run with A = 10 and B = 1 are greater than one and the odds 
decline steadily with increasing V, so the data push the distribution of V towards fewer 
arrival events. Our analysis of the model with hyper-prior parameters (A,B) = (10, 1) on 
a 64 X 25-cell lattice gave essentially identical results. The Bayes factor ioi V — I against 
V = 2 is only slightly greater than one. Very small peaks mask the big picture of Fig. [T] 
Supporting figures are presented in the Supplementary material. 

We have checked that the effects we see are not imposed by the prior distributions we 
are using. We made a single run with the lattice size doubled to 64 x 25 on the same region. 
Results for the default parameter values A = 10, B = 1 were very similar to those above 
(see Supplement). We simulated synthetic data under the single-phase model of Section 2] 
and fit it with the single-phase onset field model. We do not detect a settlement process 
where there is none. There is no threshold that splits the cells at p* = 0.8, that is, one of 
the green or blue sets is empty for each threshold T*. Also, the posterior to prior odds are 
flat with increasing V. Supporting flgures are presented in the Supplementary material. 
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7. Recovering phase structure without spatial structure 

We now extend our analysis to the case where the number of phases, and the assignment of 
specimen to phases, is not known. In this section we make a purely temporal change-point 
analysis for the unknown piecewise-constant dated-specimen deposition rate. 



7. 1. Prior model for assignments of specimens to pliases 

The number of phases M, and the map m from data index to phase, are now random 
variables. For m = 1,2, ... , M, let 7^m(m) = card {i : m(i) = m,i = 1,2 ... , K} give the 
number of dates in the m'th phase, and 

Pmi^O,^) = — ig ^ ■ -. 

give the probability for specimen i to be assigned to phase m(i) = rn. The prior probability 
for assignment m is a function of ratios of the unknown dated-specimen deposition rates 
Ai, A2, . . . , Am ) 

M 

p{m\Xe,^)=l[p^{Xg,i^)'''"^'^^ (11) 

m— 1 

The right hand side of Equation ([TT|) has a multinomial form, without the Kl/YlmKml 
factor, because the left hand side is the probability for an assignment m of distinct specimen 
to phases and there are Y[m Km^./Kl such assignments. The posterior distribution is now 

p{M, i;, Xe,m, e\y) cx i{e- y)p{M, ^, Xg,ni, 0), (12) 

with 

p{M, ^, Ae, m, 0) = p{M)p{i;\M)p{Xe\M)p{m\Xe,^)pi0\ij, m). 

If we condition on AI and m in Equation (jl2p we get Equation ([3]). Of these factors, 
p{0\tjj, m) and p{'ip\M) are given by Equations dD) and ([2]) and p{in\Xg,ip) in Equation (fTTjl . 
The prior distribution of M (in fact M — 1, the number of phase boundaries between onset 
and ending) is Poisson. If we took our prior straight from the deposition-process Aij, of 
Section!?] we would have M — 1 ^ Poisson(A^ (■(/'a/ — ipo))- In fact we take M — 1 Poisson 
distributed, but fix the mean at log(2) so that the prior probability that M = 1 is equal to 
one half. 

Our prior for the dated-specimen deposition rates Ae\M is Xe^m ~ Exp(l) for m = 
1, 2, . . . , M. The common scale of the rates is irrelevant as the rate-prior is important only 
insofar as it decides the prior for the phase allocation probabilities {pi,p2, . . . ,pm)\M . In 
our setup, for given ip, dated specimens are a priori more likely to belong to long-lasting 
phases. When — ipm-i are equal for m = 1,2,..., A/, we get (jii,p2, . . . ,pm)\iP ~ 
Dirichelet(l, 1, . . . , 1). Simulation of the prior shows that, allowing for variation in both tp 
and Xg, {pi,P2, ■ ■ ■ ,Pm)\M ~ Dirichelet(l/2, 1/2, . . . , 1/2) is a good approximation, for at 
least M < 10. In the supplement we show that, if we accept this Dirichelet approximation, 
then the array of numbers of dates in phases, {Ki,K2, . . . , Km)\M, has a multivariate Polya 
distribution, 

piK„K2, . . i^MlM) . fr r(x™ + 1/2) 



n™=ii^™!r(if + M/2) 11 ^ 
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Mosimannl (jl962l) gives the moments. The mean is E{Km) = K/M, as we expect. The co- 
variance matrix is a scalar multiple of the covariance matrix of a Multinomial (1/M, . . . , 1/Af ) 
array and the two distributions have equal correlation matrices. We conclude that the sim- 
ple choice Xg^m Exp(l) seems to lead to no unacceptable marginal prior distributions, and 
other aspects of the prior framework are motivated by the deposition-model framework. 



7.2. Estimated phase structure for tfie Bourewa and Stud Creek data sets 

We fit the random phase structure model of Section 17.11 using MCMC. We extend the 
updates of NichoUs and Jone^ ( 2001 ) to include moves to add and delete phase boundaries, 
and to move specimen between phases (by varying a specimen age or a phase boundary 
age) . When we add a phase boundary into an existing phase, the state dimension increases 
by two, as we need a deposition rate for the new phase. We have the Metropolis Hastings 
G reen se t up for MCMC in something very like the original variable-dimension application 
of lGreenI (Il995h . 



7.2.1. Bourewa 

When we fit this model to the Bourewa data, we are ignoring spatial structure. The posterior 
mode has 2 phases. When we plot the locations of specimens assigned in the mode to phases 
one to three on separate graphs, we get a picture which resembles Fig. [TOl The plot is 
omitted, for brevity, and because it is misleading, as we now explain. 

The analysis seems to show expansion from a centre, with the distribution in specimen 
locations spreading out as we move from phase to phase forwards in time. Do we need an 
onset-field analysis if the phase assignment model shows such a clear effect? First, and most 
important, central pits have more dates, so they may display more early dates. Secondly, 
if the settlement did expand, and the specimen are selected for dating uniformly in space 
and time, then we get more dates when the settlement is larger, and we get a good fit to 
the rate with a staircase of physically spurious phases. 



7.2.2. Stud Creek 

The phase-assignment analysis is useful for measuring support for alternative temporal 
phase-structure models, as we now illustrate. 

Stud creek is a small stream system in far western New South Wales, Australia. The area 
was excavated between 1996 and 1998, and the remains of 72 hea rths were recorded, of w hich 
28 yielde d one d ated specimen each. These data are reported in iHoldawav etall (|2002[ l and 
[Hold awa v et al.l ((2008 1. who argue that, in our terms, the dated hearths are a random 
sample from the population of previously buried hearths in the study area. Although there 
has been some localized erosion, hearths are scattered over a wide area, with hearths of 
markedly different ages located adjacent to one another. Their likelihood is graphed in 
Fig. [HI The archaeologists conject that the site was in use in two separate phases, with a 
hiatus associated with climate change. This question is answered in lHoldawav et al. I (|2002h . 
We use these data to illustrate our me thods for recovering ar chaeological phase structure. 
Exploratory analysis by the authors of iHoldawav et al.l (|2002r ) showed no spatial structure 
in the specimen ages. 

In order to estimate the phase structure of the Stud creek data we fit the phase-structure 
model of this section to the data (with limits L — and U — 2000 years BP). When we 
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Fig. 9. (black outline) likelihoods for tiie Stud Creek data of 'Holdawav et al.' /200^; (black, gray 
filled) marginal posterior distributions of ages shaded by phase assignment (black) phase 1 , and 
(gray) phase 3 and co nditioned on that assignment, (y-axis) years BP, (x-axis) specimen label per 
lHolclawavet"anj2002l) . 



fit a phase structured model to determine phase, we are making a model comparison over 
phase models in a discrete set. The prior distribution over phase models allows any number 
M of phases, and an arbitrary assignment m of specimen to phases. In the past, model 
comparison of this kind has been made for a very sm all number of alternative models, 
typically, two. For example. Ijones and Nicholld ( 2002f ) estimate Bayes factors in order to 



weigh support for two models of a small radiocarbon date data set with (M = 6, m = 
(1,2,3,4,5,5,6)) and without (M = l,m = (1, 1, 1, 1, 1, 1, 1)) stra tigraphic phases, from 
K = 7 dated specimen. For the Stud creek data, iHoldawav et al.l (2002) compare model 



>1 = {M = 3, m = (1 X 13, 3 X 15)} with model B = {M = 1, m = (1 x 28)} for the K ^ 28 
stud creek dates. Model A has an empty phase (phase 2) for hearth construction explaining 
the obvious step in Fig. [HI We will make two model comparisons, between models A and B, 
and between models A' = {M = 3} and B' = {M = 1}. The A' v. B' comparison weighs 
the evidence for three phases, as opposed to one, without specifying which specimens are in 
which phase. The models in the A v. B comparison specify specimens' phase assignments. 

We begin with the A' v. B' comparison. The Bayes factor is the ratio of the posterior 
to prior odds ratios, 

^ ' ^ Pr(B'|y) Pr(A')' 

We estimate Pr(A%) ~ 0.20 and Pr(B'|y) ~ 0.41 as the proportion of states Xt with M = 3 
andM = 1 phases respectively, in MCMC simulations with = (Af(*), A^*\ m^, 0(*)) 

for < = 1, 2, . . . , T and convergence X^^*^ — ?> "07 ^e, m, 9\y) to the posterior. The quanti- 
ties Pr(A') ~ 0.12 and Pr(B') = 0.5 are Poisson(M- 1; log(2)). We find B[A' , B') ~ 2.03 so 
there is very mild support for 3 phases over one. The reason for this weak positive support, 
is that model A' is a large class of models, including many very improbable assignments m 
of specimens to its three phases. The mean likelihood in the A'-prior is lowered by these 
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models. 

Since there is just one way to assign K specimen to M — 1 phases, B = B' and we have 
Pr(i3|?/) ~ 0.41 and Pr(i?) ~ 0.5 as before. The posterior probability for A was large enough 

to estimate simply, at Pr(^|j/) ~ 0.08. The prior for A is Pr(Af, m) — Pr(m|il/) Pr(Af ) 
with Pr(M = 3) ~ 0.12 and P(m|M) calculated in the supplement as Pr(m|Af) ~ lO^^". 
The Bayes factor is then B{A, i?) ~ 8 x 10^ so the support for A over B is overwhelming. 
The red and blue histograms in Fig. [5] show the posterior distribution for the hearth ages, 
conditioned on model A, whilst the colors show the maximum a posteriori phase assignment. 

Inference for A/, m is useful for exploratory analysis. When we fit the Stud-creek data, 
in the (A', i?')-analysis, the hiatus model A is thrown up as worth closer inspection in a 
search over many models. We looked at other priors on M and m; one of these is discussed 
further in the supplement. 



8. Fitting the onset-field for multiple phases 

If a settlement forms, spreads, and from time to time undergoes dramatic changes in char- 
acter, so that the deposition at all settled locations moves into a new phase, we should 
fit a model with local deposition switched on by the spreading onset-field, cut across with 
an unknown number M of site-wide phase boundaries. This is the onset-field model de- 
scribed in Section 15.31 augmented with the prior weighting for Af, the relative deposition 
rates \g = (Ae,i, Ae,2, ■ • ■ , Ae,A/) and the specimen to phase map m described in Section [T] 
The posterior density is 

p(a, /3, (j), M, ■0, Ag, m, 6*1^) cx £{9; y)p{a, l3, cj), M, -0, Ag, m, 6*), 

with 

pia, p, 0, M, 0, Ae, m, 6) = p{a, P)p{M)p{^j\M)p{X0\M)p{(b\i^, a, /3)p(m|A0, V, <^M^|V^, m, 0). 

In order to clarify the state, we explain how to simulate the prior. We simulate M ^ 
Poisson(log(2)) and ip according to Equation (UJ. The onset field is simulated using the 
algorithm in Section f5.ll with a and the two components of j3 independent and distributed 
as Equation ([S]). These steps are repeated until max(^) > ipQ. We have now to assign 
the specimen to phase map m and specimen ages 0i, 02, • • ■ , ^k- The probability, p„i,i say, 
for specimen i to be assigned to phase m depends now on the local onset field, since the 
deposition rate at a pit before onset is zero. Deposition in phase m does not occur at pit fi 
at ages before (j)c(xh)- Let 



A, 



= max(0, {min{(f>f^^^yiprn) - i^m-i)) 



give the span of deposition in phase m at pit h. The conditional single-specimen phase- 
assignment probability is 

_ Ae,mAm,h(i) 

Z-^rn' — l '^0 ,m' ^m' ,h{i) 

independently for each i — 1,2,...,K and the conditional probability for any particular 
phase assignment m is 

K 



1=1 
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Finally, the unknown true specimen ages Oi, i = 1,2, . . . , K are uniformly distributed 
in ipin{i)-i < Oi < min(V'm, as before. The MCMC simulation of the posterior 
p{a, P, 0, M, ip, Ae, m, 6*1?;) combines the MCMC updates of Sections[n]and01 Each MCMC 
simulation involving onset-fields takes at least 24 hours to run. 

The posterior onset-field distribution we obtain fitting this second random-phase model 
is very similar to the distribution we had in the single phase onset-field model. The figures 
corresponding to Figures [SJ [5] and Figure [7] appear in the supplementary material. 

The modal number of phases is one. However, the prior weights in favor of a single 
phase. We can estimate marginal likelihoods, 

e,„ = Pr(M = m|j/)/Pr(M = m) 

for M = m phases, averaged over all other parameters. The ratio emi/e„i2 is the Bayes 
factor for the model comparison of M = mi against M — m2 phases. We find ei ~ 0.8, 
62 — 1 and 63 ~ 1.3, so the single phase model of SectionlHlis not contradicted. 

In order to visualise the settlement directly on the data, we look at the distribution of 
dates when there are three phases (supported by the largest reliably estimated marginal 
likelihood) . The scatter of the specimens in each phase is shown in Fig. [TOl The scatter 
plots shows a clear spreading trend as we move from phase to phase. 
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Fig. 10. The scatter of specimen by phase, in the posterior mode assignment m conditional on three 
phases, in a random-phase model with onset-field, (left) Phase 3 (middle) Phase 2 (right) Phase 1. 
Axes are specimen excavation coordinates. Time increases left to right from phase 3 to phase 1. 
Point-size is proportional to posterior probability for the given phase assignment. 



9. Conclusions 

We have fitted four models to the data: a simple single-phase model without spatial struc- 
ture (Section|3]), a single-phase model with an onset field (SectionlH]), a random-phase model 
with no spatial structure (|7.2p . and a random phase model with onset-field (Section [S]). The 
single-phase onset-field (second model) models the unknown specimen dates at each pit as 
uniformly distributed in time between the local onset and the end-of-deposition event. The 
single-phase model without onset-field (first model) applies too much shrinkage to the earli- 
est dates, as the total deposition rate in a spreading settlement is not initially large. This is 
visible in Fig. [4] where the single-phase estimate for ^jj^ is shifted by 200 years towards the 
present. The single-phase onset-field model estimate 3000-3200 BP will be more reliable. 
The onset-field priors do not overwhelm the data. 
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In the random-phase model (third model), phases are the pieces of the piece- wise con- 
stant specimen-age intensity. Like the first model, this model treats the site as a single 
spatially unstructured pit, ignoring variations in the intensity which might be due to onset- 
time varying from place to place. It will detect the low initial deposition rate and not 
over-shrink. However, it is unlikely that the phases recovered by the random phase model 
correspond to physically distinct phases, in the archaeological sense. The evidence for phases 
is variation in the intensity of the dated-specimen ages. Such variation might be caused 
by any one of many time-varying selection processes which thin dateable material down 
to dated specimen. However, the random phase model does pr ovide, at least heuristically, 
for rate variation, in the spirit of Blaa uw and Christen ( 20051 ) , who mod el the age-depth 
relation with a variable number of accumulation sections, and Karlsbere ([2OO6I . who pa- 
rameterises rate variation within a fixed number of phases. The Stud Creek hiatus example 
has the unusual property that confounding processes may be absent. 

The random-phase onset-field model (fourth model) adapts to heterogeneity in spatial 
and temporal deposition rates. It is rather complex. However, it functions as a model- 
mispecification check on the constant-rate hypothesis of the simpler single-phase onset-field 
model, which it supports, as it allows a single phase. 

Our analysis (Fig. [?]) shows that the human-associated deposition did not commence at 
each pit location at the same time. The posterior mean onset field Fig. [5] shows the pattern 
of expansion from a central location in the onset of deposition at Bourewa. Ancient human 
settlement generating dated specimen begins in the area between pits '1','4' and 'X6' and 
takes some 150-200 years to spread some 40-50 meters along the beach to pits '2', 'AlA' 
and 'AID'. 
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